DETAILED ACTION 

1. This action is in response to the amendment filed 3/7/2005. 

2. Claims 1-37 have been examined and are pending in the application. 



Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 



3. Claims 1-15 and 19-37 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Hashimoto U.S. Patent No. 6,119,147. 

As to claim 1, Hashimoto teaches a system comprising: 
a program storage device (a computer, line 38 column 3) that stores a 
multi-modal application (combination of SRS 1 and application programs 2, Fig. 6), the 
multi-modal application comprising at least a first and a second mode process (inputs and 
outputs from the application programs, lines 23-25 column 41; speech synthesis unit 
operated as an independent process, lines 37-38 column 42) that enables user 
interaction with the application in a first modality and second modality (the system 
allows the user to interact with the system via text data or speech data, lines 50-63 
column 49; Figs. 17, 66, and 69-72); 

a program execution system (speech recognition interface system, lines 11-12 
column 10) that executes the multi-modal application and synchronizes the first and 
second mode processes while a user interacts with the multi-modal application (lines 
12-37 column 10) wherein the program execution system comprises: 

a multi-modal shell (11, Fig. 56) that manages information exchanges between 
the processes (controls by exchanging the messages transmitted from the speech unit 
to the application program, lines 38-41 column 10) to enable a synchronized 
multi-modal interaction with the application (the system allows the user to interact with the 
system via text data or speech data, lines 50-63 column 49; Figs. 17, 66, and 69-72) 
wherein user interaction in one modality results in execution of corresponding 
commands in both the first and second mode processes (lines 38-63 column 49); 

the processes register their respective active commands and corresponding 
actions (register the recognition vocabularies and the appropriate actions that respond 
to these vocabularies, line 48 column 10 to line 10 column 11) with the multi-modal shell 
(11, Fig. 56). 

Hashimoto does not explicitly teach an API. However, Hashimoto teaches that 
each application program includes a message I/O unit (lines 9-29 column 12) wherein all 
of the application program interactions with the speech recognition system are handled 
by this message I/O unit. Therefore one of ordinary skill in the art would conclude that 
this message I/O unit could be used as an API since it provides the interface for each 
application program to interact with the speech recognition system. 

As to claim 2, Hashimoto as modified further teaches a registry having a 
registration table (program management table, line 58 column 10), managed by the 
multi-modal shell (11, Fig. 56), that comprises a list of each of the registered commands 
and corresponding synchronized actions (the recognition vocabularies and the 
appropriate actions that respond to these vocabularies, line 48 column 10 to line 10 
column 11) that results in both the first and second mode processes upon execution of 
a registered command (once the speech input "Finish" is transmitted, both of the 
application programs can be finished by this single speech input, lines 5-8 column 27) by 
one of the first and second mode processes (inputs and outputs from the application 
programs, lines 23-25 column 41). 

As to claim 3, Hashimoto as modified further teaches the multi-modal 
application comprises a first mono-mode application (speech synthesis unit, line 37 
column 42) for the first mode process and a second mono-mode application (application 
program, lines 41-42 column 42) for the second mode process, wherein the multi-modal 
shell (11, Fig. 56) manages and synchronizes information exchanges (message 
exchanges using the process communication, lines 39-40 column 42) between the first 
(speech synthesis unit, line 37 column 42) and second mono-mode applications 
(application program, lines 41-42 column 42). 

As to claim 4, Hashimoto as modified does not teach devices having user 
interface modalities. However, Hashimoto teaches (lines 51-55 column 12) the 
message system can be implemented as a server and client system wherein the 
speech recognition unit can act as a server and the application programs are clients. 
Therefore one of ordinary skill in the art would conclude that the clients are the devices 
with modalities wherein these devices can register their commands and corresponding 
actions with the server. 

As to claim 5, Hashimoto as modified further teaches the devices and multi-modal 
shell are distributed over a network (server and client system, lines 51-55 column 12), 
and wherein the API is implemented using distributed APIs or protocols (byte stream 
type protocol, lines 54-55 column 12). 

As to claim 6, Hashimoto as modified further teaches a mechanism for 
converting a mono-mode application to a multi-modal application (application program 5 
interacts with both speech input and keyboard input, Fig. 17). 

As to claim 7, Hashimoto as modified further teaches the mono-mode 
application is a GUI application (user interface of Fig. 18), the mechanism (11, Fig. 56) 
provides speech enablement (controls by exchanging the messages transmitted from 
the speech unit to the application program, lines 38-41 column 10) of the GUI 
application (user interface of Fig. 18) by registering the active commands of the GUI 
application and building a grammar for the registered commands to support the 
commands in a speech modality (register the recognition vocabularies and the 
appropriate actions that respond to these vocabularies, line 48 column 10 to line 10 
column 11). 

As to claim 8, Hashimoto as modified further teaches a mechanism for building 
a multi-modal application (application program 5 interacts with both speech input and 
keyboard input, Fig. 17). 

As to claim 9, Hashimoto as modified further teaches the mechanism (11, Fig. 
56) is used for directly programming the registry by building a registration table 
(program management table, line 58 column 10) having user-defined commands and 
corresponding actions (the recognition vocabularies and the appropriate actions that 
respond to these vocabularies, line 48 column 10 to line 10 column 11) for each of the 
modalities of the multi-modal application (1A, Fig. 56). 

As to claim 10, Hashimoto as modified further teaches that the system is 
implemented on personal computers, workstations... (lines 8-12 column 1). Hashimoto 
does not explicitly disclose an operating system. "Official Notice" is taken that both the 
concept and advantage of providing for an operating system is well known and 
expected in the art. It would have been obvious to include an operating system into the 
system of Hashimoto because it would provide the execution space for the system. 

As to claim 11, Hashimoto as modified further teaches the system is distributed 
over a network (server and client system, lines 51-55 column 12). 

As to claim 12, Hashimoto as modified further teaches the multi-modal 
application (1A, Fig. 56) is a multi-modal browser (a mail browser, Fig. 27), comprising 
first and second browser applications (multiple application programs 2, Fig. 7). 

As to claim 13, Hashimoto as modified further teaches the first browser is GUI 
(interface of Fig. 27) and the second browser is speech (the user can command to open 
the mail by saying "yes", lines 54-60 column 25). 

As to claim 14, Hashimoto as modified further teaches the multi-modal shell (11, 
Fig. 56) processes the multi-modal application (recognition result, line 16 column 25) to 
send modality-specific presentation information (speech result from the speech 
recognition system, lines 14-17 column 25) to the respective browsers (a mail browser, 
Fig. 27). 

As to claim 15, Hashimoto as modified further teaches the multi-modal 
application is authored using a modality-independent description and wherein the 
multi-modal shell generates (recognition result, line 16 column 25) the modality-specific 
presentation information (speech result from the speech recognition system, lines 14-17 
column 25) from the modality-independent description (input speech, line 12 column 
25). 

As to claim 19, Hashimoto teaches a method comprising: 
activating a multi-modal application (combination of SRS 1 and application 
programs 2, Fig. 6) comprising at least a first mode process (inputs and outputs from 
the application programs, lines 23-25 column 41; speech synthesis unit operated as an 
independent process, lines 37-38 column 42) that enables user interaction with the 
application in a first modality (the system allows the user to interact with the system via 
text data or speech data, lines 50-63 column 49; Figs. 17, 66, and 69-72) and a second 
mode process (inputs and outputs from the application programs, lines 23-25 column 
41; speech synthesis unit operated as an independent process, lines 37-38 column 42) 
that enables user interaction with the application in a second modality (the system 
allows the user to interact with the system via text data or speech data, lines 50-63 
column 49; Figs. 17, 66, and 69-72); 

receiving a command (speech input, line 14 column 41) in a first modality 
(speech recognition interface system, line 13 column 41); 

triggering an action (speech output, line 19 column 41) in the first mode process 
based on the received command (speech recognition interface system, line 13 column 
41) and triggering a corresponding action (the mail is opened when the user says 
"yes", lines 54-60 column 25) by the second mode process (application program, lines 
41-42 column 42); 

updating application state associated with the second mode process (update the 
program management data in the program management according to an internal state 
of each application program, lines 35-39 column 76). 

Hashimoto does not explicitly teach updating application state associated with 
the first mode process. However, Hashimoto teaches (lines 37-46 column 14) that the 
speech recognition system also has its own internal state, and that when this internal 
state changes, the application program receives a message notifying it of the 
change. Therefore one of ordinary skill in the art would conclude that, at some point, 
the internal state of the speech recognition system is updated, thereby producing the 
state change. 

As to claim 20, Hashimoto as modified further teaches registering active 
commands associated with the first and second mode processes (the recognition 
vocabularies and the appropriate actions that respond to these vocabularies, line 48 
column 10 to line 10 column 11); associating, with each registered command of the 
mode processes, an action on one mode process and a corresponding action on the 
other mode process (once the speech input "Finish" is transmitted, both of the 
application programs can be finished by this single speech input, lines 5-8 column 27). 

As to claim 21, Hashimoto as modified further teaches building a command to 
action registration table (program management table, line 58 column 10) based on the 
registered commands and actions (the recognition vocabularies and the appropriate 
actions that respond to these vocabularies, line 48 column 10 to line 10 column 11). 

As to claim 22, Hashimoto as modified further teaches the registration table 
(program management table, line 58 column 10) is built by a multi-modal shell (11, Fig. 
56). Hashimoto does not explicitly teach an API. However, Hashimoto teaches that 
each application program includes a message I/O unit (lines 9-29 column 12) wherein all 
of the application program interactions with the speech recognition system are handled 
by this message I/O unit. Therefore one of ordinary skill in the art would conclude that 
this message I/O unit could be used as an API since it provides the interface for each 
application program to interact with the speech recognition system. 

As to claim 23, Hashimoto as modified further teaches looking up the received 
command (the recognition vocabulary lists, lines 59-60 column 10) in the registration 
table (program management table, line 58 column 10); and executing the actions 
associated with the received command on the first and second mode processes (once 
the speech input "Finish" is transmitted, both of the application programs can be finished 
by this single speech input, lines 5-8 column 27). 

As to claim 24, Hashimoto as modified further teaches registering a callback 
handle for each of the registered commands to notify the first and second mode 
processes of completion of the actions corresponding to the registered commands (after 
the dictionary production is completed, notifies this act to the data acquisition unit 8 by a 
message indicating the completion of the dictionary production, lines 3-5 column 33). 

As to claim 25, Hashimoto as modified further teaches executing the callback 
handle associated with the received command to trigger callback actions on the mode 
processes (after the dictionary production is completed, notifies this act to the data 
acquisition unit 8 by a message indicating the completion of the dictionary production, 
lines 3-5 column 33). 

As to claim 26, Hashimoto as modified further teaches executing a first thread 
associated with the received command ("Finish", line 6 column 27); triggering a 
corresponding second thread to initiate the corresponding action on the second mode 
process (once the speech input "Finish" is transmitted, both of the application programs 
can be finished by this single speech input, lines 5-8 column 27). 

As to claim 27, Hashimoto as modified further teaches the threads are applets 
("Yes" and "No" icons in Fig. 27). 

As to claim 28, Hashimoto as modified further teaches the threads communicate 
via socket connections (server and client system, lines 51-55 column 12, with a mail 
browser, Fig. 27). 

As to claims 29-37, they are device claims of claims 19-21 and 23-28, 
respectively. Therefore, they are rejected for the same reasons as claims 19-21 and 
23-28 above. 

4. Claims 16-18 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Hashimoto in view of Toomey U.S. Patent No. 6,119,147. 

As to claim 16, Hashimoto as modified does not explicitly teach the multi-modal 
application comprises a combination of declarative markup languages. Toomey 
teaches a multi-modal document with a combination of declarative markup languages 
(text discussion, audio commands, graphics, and documents, lines 60-61 column 1). It 
would have been obvious to apply the teachings of Toomey to the system of Hashimoto 
because this document will provide the user with the convenience of interacting with the 
system using the choice of texts or speech commands. 

As to claim 17, Toomey further discloses combining the declarative markup 
languages and synchronization elements to provide tight synchronization (interactions 
are inserted into the multi-modal document at a point that is chronological in the 
meeting to create a synchronous meeting, lines 11-13 column 15). 

As to claim 18, Toomey further discloses separate files for each of the 
declarative markup languages (multiple tracks in the multi-modal document, lines 59-60 
column 1). 

Response to Arguments 

5. Applicant's arguments filed 3/7/2005 have been fully considered but they are 
not persuasive. 

Applicant argued that the Hashimoto reference does not teach a multi-modal 
application (Remarks, fourth paragraph page 9). In response, as disclosed in the claims 
rejection above, Hashimoto teaches the system allows the user to interact with the 
application via text data or speech data (lines 50-63 column 49; Figs. 17, 66, and 
69-72). The SRS 1 (Fig. 6) acts as an interface of the application wherein this interface 
transfers user's inputs into recognizable results before sending them to the application 
(lines 12-37 column 10). Therefore, the SRS and the application together make up a 
multi-modal application. The reference meets the limitation as claimed. 

Applicant argued that the Hashimoto reference does not teach executing the 
commands for the application (Remarks, first, second and third complete paragraphs 
page 10). In response, as clearly disclosed in the claim rejection above, Hashimoto 
teaches triggering an action (speech output, line 19 column 41) in the first mode 
process based on the received command (speech recognition interface system, line 13 
column 41) and triggering a corresponding action (the mail is opened when the user 
says "yes", lines 54-60 column 25) by the second mode process (application program, 
lines 41-42 column 42). The reference meets the limitation as claimed. 

Conclusion 

The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Andy Ho whose telephone number is (571) 272-3762. 
A voice mail service is also available for this number. The examiner can normally be 
reached on Monday - Friday, 8:30 am - 5:00 pm. 

Any inquiry of a general nature or relating to the status of this application or 
proceeding should be directed to the receptionist whose telephone number is 
571-272-2100. 

Any response to this action should be mailed to: 
Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 
Or fax to: 

• AFTER-FINAL faxes must be signed and sent to (703) 872-9306. 

• OFFICIAL faxes must be signed and sent to (703) 872-9306. 

• NON-OFFICIAL faxes should not be signed; please send to (571) 273-3762. 



A.H 

May 25, 2005 